PSD Representations for Effective Probability Models

Neural Information Processing Systems

Finding a good way to model probability densities is key to probabilistic inference. An ideal model should be able to concisely approximate any probability density while also being compatible with two main operations: multiplication of two models (product rule) and marginalization with respect to a subset of the random variables (sum rule). In this work, we show that a recently proposed class of positive semi-definite (PSD) models for non-negative functions is particularly suited to this end. In particular, we characterize both the approximation and generalization capabilities of PSD models, showing that they enjoy strong theoretical guarantees. Moreover, we show that both the sum and product rules can be performed efficiently in closed form via matrix operations, giving PSD models the same versatility as mixture models. Our results open the way to applications of PSD models in density estimation, decision theory, and inference.
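
As a rough illustration of the model class referenced above (not code from the paper): a PSD model represents a non-negative function as f(x) = Phi(x)^T A Phi(x) with a positive semi-definite matrix A. The Gaussian feature map, anchor points, and bandwidth below are illustrative assumptions; non-negativity holds for any input because the expression is a quadratic form in a PSD matrix.

```python
import numpy as np

def gaussian_features(x, anchors, bandwidth=1.0):
    """Feature map Phi(x): Gaussian kernel evaluations against anchor points."""
    d2 = np.sum((anchors - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def psd_model(x, anchors, A, bandwidth=1.0):
    """Evaluate f(x) = Phi(x)^T A Phi(x); non-negative whenever A is PSD."""
    phi = gaussian_features(x, anchors, bandwidth)
    return phi @ A @ phi

rng = np.random.default_rng(0)
anchors = rng.normal(size=(5, 2))   # illustrative anchor points in R^2
B = rng.normal(size=(5, 5))
A = B @ B.T                         # a Gram matrix, hence PSD by construction

x = rng.normal(size=2)
print(psd_model(x, anchors, A) >= 0)  # True: the PSD structure guarantees non-negativity
```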


Output Supervision Can Obfuscate the Chain of Thought

Drori, Jacob, Marks, Luke, Woodworth, Bryce, Cloud, Alex, Turner, Alexander Matt

arXiv.org Artificial Intelligence

OpenAI (2025a) showed that training against a chain-of-thought (CoT) monitor can cause obfuscated CoTs, which contain bad behavior the monitor cannot detect. They proposed to keep CoTs monitorable by training only against output monitors that do not have access to the CoT. We show that such training can still cause obfuscated CoTs via two mechanisms. First, when a model is trained to produce a safe-looking output, it may generalize to making its CoTs look safe. Second, since later tokens are conditioned on earlier ones, safe-looking CoTs may increase the likelihood of safe outputs, causing safe-looking CoTs to be reinforced. We introduce two mitigations to address these two issues, which achieve a Pareto improvement in monitorability and task performance compared to regular training. Our work implies that preserving CoT monitorability is more difficult than previously thought; we suggest practical guidelines for AI developers to maintain monitorable CoTs. Reasoning models, which generate a CoT before their final answer, dominate mathematics and coding benchmarks (Guo et al., 2025; OpenAI, 2025b; Qwen Team, 2025). Korbak et al. (2025) argue that AI developers should aim to maintain the ability to detect model misbehavior by analyzing CoTs ("CoT monitorability"), claiming that it offers a "unique opportunity for AI safety". Baker et al. (2025) show that training frontier models against a CoT monitor can cause their CoTs to become obfuscated: unwanted behaviors persist in a way that is undetectable by the monitor. To address this, they recommend against directly optimizing the CoT. However, we show that even training against a monitor that sees only the final output (and not the CoT) can still cause obfuscated CoTs. We refer to this effect as feedback spillover and show that it can occur for two reasons (Figure 1: training against an output monitor obfuscates the CoT). To mitigate parametric feedback spillover, we generate the CoT and output using two distinct models, which we refer to as the "Mind" and "Face" respectively, as proposed by Kokotajlo & Demski (2024).
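
A schematic sketch of the "Mind"/"Face" mitigation mentioned above, assuming a hypothetical .generate(prompt) interface (this is not the authors' training code). The idea is simply that the CoT and the final output come from two separately parameterized models, so reward from the output monitor does not directly shape the weights that produce the CoT.

```python
# Schematic sketch of the "Mind"/"Face" split described above.
# `mind_model` and `face_model` stand for two separately trained language models
# with a hypothetical .generate(prompt) -> str interface; only the Face would be
# trained against the output monitor, so its reward never updates the Mind.

def answer_with_split_models(task_prompt, mind_model, face_model):
    # Mind: produce only the chain of thought.
    cot = mind_model.generate(f"{task_prompt}\n\nThink step by step:")
    # Face: condition on the CoT and produce only the final answer.
    output = face_model.generate(f"{task_prompt}\n\nReasoning:\n{cot}\n\nFinal answer:")
    return cot, output
```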


Supplementary Material: Estimating Fluctuations in Neural Representations of Uncertain Environments

Neural Information Processing Systems

In the framework specified in Section 2.2, we use a first-order Markov chain with two states as our model. Figure S1 shows, for four different cells, the computed posterior distribution function. Here, we concentrate only on trials within the original environments, where we know the correct environment and hence can assess how well the decoding performs. In this approach, instead of using a state-space structure, we use the likelihoods given by Eq. (1). Each plot shows a histogram of the average probability (over time) of correctly decoding the trials within unambiguous environments. Fig. S3 shows the decoded environment for a few sample trials based on the neural activity of the whole population. In some trials (e.g., trials 65 and 25) we observe few fluctuations, while in others we observe more. In Eq. (6), we use a history-dependent, gamma-distributed generalized linear model with an identity link.
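
A minimal sketch of posterior decoding with a two-state first-order Markov chain, as in the framework described above; the transition matrix, initial distribution, and likelihoods are placeholders to be supplied by the user, not the paper's fitted quantities.

```python
import numpy as np

def two_state_posterior(log_lik, T, pi):
    """Forward-backward smoothing for a two-state first-order Markov chain.

    log_lik : (n_bins, 2) log-likelihood of the observed activity under each state
    T       : (2, 2) transition matrix, rows summing to 1
    pi      : (2,) initial state distribution
    Returns the (n_bins, 2) posterior over states at each time bin.
    """
    n = log_lik.shape[0]
    lik = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))  # rescaled for stability

    alpha = np.zeros((n, 2))
    alpha[0] = pi * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ T) * lik[t]   # predict, then update
        alpha[t] /= alpha[t].sum()

    beta = np.ones((n, 2))
    for t in range(n - 2, -1, -1):
        beta[t] = T @ (lik[t + 1] * beta[t + 1])  # backward pass
        beta[t] /= beta[t].sum()

    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)
```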


Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method

Li, Chenyang, Song, Zhao, Xu, Zhaoxing, Yin, Junze

arXiv.org Artificial Intelligence

Leverage scores have become essential in statistics and machine learning, aiding regression analysis, randomized matrix computations, and various other tasks. This paper delves into the inverse problem, aiming to recover the intrinsic model parameters given the leverage score gradient. This endeavor not only enriches the theoretical understanding of models trained with leverage score techniques but also has substantial implications for data privacy and adversarial security. We specifically scrutinize the inversion of the leverage score gradient, denoted as $g(x)$. An innovative iterative algorithm is introduced for the approximate solution of the regularized least squares problem $\min_{x \in \mathbb{R}^d} 0.5 \|g(x) - c\|_2^2 + 0.5\|\mathrm{diag}(w)Ax\|_2^2$. Our algorithm employs subsampled leverage score distributions to compute an approximate Hessian in each iteration, which, under standard assumptions, considerably reduces the time complexity. Given that a total of $T = \log(\| x_0 - x^* \|_2 / \epsilon)$ iterations is required, the cost per iteration is optimized to the order of $O((\mathrm{nnz}(A) + d^{\omega}) \cdot \mathrm{poly}(\log(n/\delta)))$, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$.
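
For context, a short NumPy sketch (illustrative, not the paper's algorithm) of the quantities involved: the classical leverage scores of a matrix and the regularized least-squares objective quoted in the abstract, with `g` passed in as a stand-in for the leverage-score-gradient map that the paper defines.

```python
import numpy as np

def leverage_scores(A):
    """Classical leverage scores: diagonal of the hat matrix A (A^T A)^{-1} A^T."""
    # Stable computation via a thin QR factorization: scores are squared row norms of Q.
    Q, _ = np.linalg.qr(A)
    return np.sum(Q ** 2, axis=1)

def objective(x, g, c, w, A):
    """Regularized least squares from the abstract:
    0.5 * ||g(x) - c||_2^2 + 0.5 * ||diag(w) A x||_2^2,
    where g is a placeholder for the leverage-score-gradient map."""
    return 0.5 * np.linalg.norm(g(x) - c) ** 2 + 0.5 * np.linalg.norm(w * (A @ x)) ** 2

# Illustrative usage with a random matrix (not data from the paper).
rng = np.random.default_rng(1)
A = rng.normal(size=(100, 5))
print(leverage_scores(A).sum())  # sums to rank(A) = 5 for a full-column-rank matrix
```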


Camera Model Identification Using Audio and Visual Content from Videos

Tsingalis, Ioannis, Korgialas, Christos, Kotropoulos, Constantine

arXiv.org Artificial Intelligence

The identification of device brands and models plays a pivotal role in multimedia forensic applications. This paper presents a framework capable of identifying devices using audio content, visual content, or a fusion of the two. Visual and audio content are combined at a late stage by applying two fundamental fusion rules: the product rule and the sum rule. The device identification problem is treated as a classification problem and tackled with Convolutional Neural Networks. Experimental evaluation illustrates that the proposed framework exhibits promising classification performance when using audio or visual content independently. Furthermore, although the fusion results do not consistently surpass both individual modalities, they demonstrate promising potential for enhancing classification performance. Future research could refine the fusion process so that it consistently improves on both modalities. Finally, a statistical significance test is performed for a more in-depth study of the classification results.
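
A brief sketch of the two late-fusion rules named above, assuming each modality's CNN outputs a softmax posterior over camera models (the class probabilities below are made up for illustration): the product rule multiplies the per-modality posteriors and renormalizes, while the sum rule averages them.

```python
import numpy as np

def product_rule(p_audio, p_visual):
    """Late fusion by the product rule: elementwise product of posteriors, renormalized."""
    fused = p_audio * p_visual
    return fused / fused.sum()

def sum_rule(p_audio, p_visual):
    """Late fusion by the sum rule: average of the per-modality posteriors."""
    return 0.5 * (p_audio + p_visual)

# Illustrative posteriors over three hypothetical camera models.
p_audio = np.array([0.6, 0.3, 0.1])
p_visual = np.array([0.5, 0.1, 0.4])
print(product_rule(p_audio, p_visual), sum_rule(p_audio, p_visual))
```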


Details of Second-Order Partial Derivatives of Rigid-Body Inverse Dynamics

Singh, Shubham, Russell, Ryan P., Wensing, Patrick M.

arXiv.org Artificial Intelligence

The details of the second-order partial derivatives of rigid-body inverse and forward dynamics are provided. Several properties and identities using spatial vector algebra are listed, along with their detailed derivations. The expressions build upon previous work by the author on first-order partial derivatives of inverse dynamics. The first- and second-order derivatives are also extended to systems with external forces. Finally, the derivatives of KKT forward dynamics and impact dynamics are derived.
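
As a toy illustration of first- and second-order partials of inverse dynamics (a single pendulum with a hand-derived torque expression, not the spatial-vector-algebra results of the paper), the analytic derivatives can be checked against finite differences:

```python
import numpy as np

# Toy single-pendulum example (mass m, length l, gravity g), purely illustrative.
m, l, g = 1.0, 0.5, 9.81

def inverse_dynamics(q, qdd):
    """Joint torque tau(q, qdd) = m l^2 qdd + m g l sin(q)."""
    return m * l**2 * qdd + m * g * l * np.sin(q)

def d_tau_dq(q):
    return m * g * l * np.cos(q)    # first-order partial w.r.t. joint position

def d2_tau_dq2(q):
    return -m * g * l * np.sin(q)   # second-order partial w.r.t. joint position

# Finite-difference check of the analytic partials.
q, qdd, h = 0.7, 2.0, 1e-4
fd1 = (inverse_dynamics(q + h, qdd) - inverse_dynamics(q - h, qdd)) / (2 * h)
fd2 = (inverse_dynamics(q + h, qdd) - 2 * inverse_dynamics(q, qdd)
       + inverse_dynamics(q - h, qdd)) / h**2
print(np.isclose(fd1, d_tau_dq(q)), np.isclose(fd2, d2_tau_dq2(q)))
```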


Quaternion Backpropagation

Pöppelbaum, Johannes, Schwung, Andreas

arXiv.org Artificial Intelligence

Quaternion-valued neural networks have experienced rising popularity and interest from researchers in recent years, whereby the derivatives with respect to quaternions needed for optimization are calculated as the sum of the partial derivatives with respect to the real and imaginary parts. However, we show that the product and chain rules do not hold with this approach. We solve this by employing the GHR calculus and derive quaternion backpropagation based on it. Furthermore, we experimentally demonstrate the functionality of the derived quaternion backpropagation.
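
For intuition, one underlying difficulty is that quaternion multiplication is non-commutative; a minimal Hamilton-product sketch (illustrative, unrelated to the authors' implementation) makes this concrete.

```python
import numpy as np

def hamilton_product(p, q):
    """Hamilton product of quaternions p = (w, x, y, z) and q = (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    ])

p = np.array([1.0, 2.0, 3.0, 4.0])
q = np.array([0.5, -1.0, 0.0, 2.0])
# p*q and q*p differ: this non-commutativity is part of why real-valued
# derivative rules cannot simply be reused component-wise (see abstract above).
print(hamilton_product(p, q), hamilton_product(q, p))
```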


Differentiating and Integrating ZX Diagrams with Applications to Quantum Machine Learning

Wang, Quanlong, Yeung, Richie, Koch, Mark

arXiv.org Artificial Intelligence

ZX-calculus has proved to be a useful tool for quantum technology, with a wide range of successful applications. Most of these applications are of an algebraic nature. However, other tasks that involve differentiation and integration remain unreachable with current ZX techniques. Here we elevate ZX to an analytical perspective by realising differentiation and integration entirely within the framework of ZX-calculus. We explicitly illustrate the new analytic framework of ZX-calculus by applying it in the context of quantum machine learning for the analysis of barren plateaus.